ml api

Automatic and Efficient Customization of Neural Networks for ML Applications

Liu, Yuhan, Wan, Chengcheng, Du, Kuntai, Hoffmann, Henry, Jiang, Junchen, Lu, Shan, Maire, Michael

arXiv.org Artificial Intelligence

ML APIs have greatly relieved application developers of the burden of designing and training their own neural network models: classifying objects in an image can now be as simple as one line of Python code to call an API. However, these APIs offer the same pre-trained models regardless of how their output is used by different applications. This can be suboptimal, as not all ML inference errors cause application failures, and the distinction between inference errors that can or cannot cause failures varies greatly across applications. To tackle this problem, we first study 77 real-world applications, which collectively use six ML APIs from two providers, to reveal common patterns of how ML API output affects applications' decision processes. Inspired by the findings, we propose ChameleonAPI, an optimization framework for ML APIs, which takes effect without changing the application source code. ChameleonAPI provides application developers with a parser that automatically analyzes the application to produce an abstract of its decision process, which is then used to devise an application-specific loss function that only penalizes API output errors critical to the application. ChameleonAPI uses the loss function to efficiently train a neural network model customized for each application and deploys it to serve API invocations from the respective application via the existing interface. Compared to a baseline that selects the best-of-all commercial ML API, we show that ChameleonAPI reduces incorrect application decisions by 43%.
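
To make the idea of penalizing only application-critical errors concrete, here is a minimal sketch. It is not ChameleonAPI's loss function, which the abstract does not specify; it only contrasts a generic label error with an error counted on the application's decision. The application logic (`RECYCLABLE`, `decide`) and all label names are hypothetical.

```python
from typing import Iterable, Set

# Hypothetical application logic: the app marks an item as recyclable
# if the API returns any label in this set (names are illustrative only).
RECYCLABLE: Set[str] = {"bottle", "can", "paper"}


def decide(labels: Iterable[str]) -> bool:
    """Application decision derived from the API's label output."""
    return any(label in RECYCLABLE for label in labels)


def label_error(predicted: Set[str], truth: Set[str]) -> int:
    """Generic error: 1 if the predicted label set differs from the truth."""
    return int(predicted != truth)


def decision_error(predicted: Set[str], truth: Set[str]) -> int:
    """Application-specific error: 1 only if the app's decision changes."""
    return int(decide(predicted) != decide(truth))


# Confusing "can" with "bottle" is a label error, but the app still
# makes the same decision, so it is not a decision error.
print(label_error({"can"}, {"bottle"}), decision_error({"can"}, {"bottle"}))
```

Under this toy definition, a model could be evaluated (or, with a differentiable surrogate, trained) against `decision_error` rather than `label_error`, so that mistakes the application never observes carry no penalty.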


What Kinds of Contracts Do ML APIs Need?

Khairunnesa, Samantha Syeda, Ahmed, Shibbir, Imtiaz, Sayem Mohammad, Rajan, Hridesh, Leavens, Gary T.

arXiv.org Artificial Intelligence

Recent work has shown that Machine Learning (ML) programs are error-prone and called for contracts for ML code. Contracts, as in the design by contract methodology, help document APIs and aid API users in writing correct code. The question is: what kinds of contracts would provide the most help to API users? We are especially interested in what kinds of contracts help API users catch errors at earlier stages in the ML pipeline. We describe an empirical study of posts on Stack Overflow of the four most often-discussed ML libraries: TensorFlow, Scikit-learn, Keras, and PyTorch. For these libraries, our study extracted 413 informal (English) API specifications. We used these specifications to understand the following questions. What are the root causes and effects behind ML contract violations? Are there common patterns of ML contract violations? When does understanding ML contracts require an advanced level of ML software expertise? Could checking contracts at the API level help detect the violations in early ML pipeline stages? Our key findings are that the most commonly needed contracts for ML APIs are either checking constraints on single arguments of an API or on the order of API calls. The software engineering community could employ existing contract mining approaches to mine these contracts to promote an increased understanding of ML APIs. We also noted a need to combine behavioral and temporal contract mining approaches. We report on categories of required ML contracts, which may help designers of contract languages.
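
The two contract kinds highlighted in the findings, constraints on a single argument and constraints on the order of API calls, can be illustrated with a small sketch. `ToyModel` and `ContractViolation` are hypothetical names invented for this example; they do not come from the study or from any of the four libraries.

```python
from typing import List


class ContractViolation(Exception):
    """Raised when a caller breaks a documented API contract."""


class ToyModel:
    """Hypothetical estimator used only to illustrate the two contract kinds."""

    def __init__(self, learning_rate: float) -> None:
        # Single-argument contract: constrain the value of one parameter.
        if not 0.0 < learning_rate <= 1.0:
            raise ContractViolation("learning_rate must be in (0, 1]")
        self.learning_rate = learning_rate
        self._fitted = False
        self._mean = 0.0

    def fit(self, xs: List[float], ys: List[float]) -> "ToyModel":
        # Toy "training": remember the mean of the targets.
        self._mean = sum(ys) / len(ys) if ys else 0.0
        self._fitted = True
        return self

    def predict(self, xs: List[float]) -> List[float]:
        # Call-order contract: fit() must have been called before predict().
        if not self._fitted:
            raise ContractViolation("call fit() before predict()")
        return [self._mean for _ in xs]
```

Checking both conditions at the API boundary surfaces the violation at the call site, rather than as a confusing failure in a later pipeline stage.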


HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions

Chen, Lingjiao, Jin, Zhihua, Eyuboglu, Sabri, Ré, Christopher, Zaharia, Matei, Zou, James

arXiv.org Artificial Intelligence

Commercial ML APIs offered by providers such as Google, Amazon and Microsoft have dramatically simplified ML adoption in many applications. Numerous companies and academics pay to use ML APIs for tasks such as object detection, OCR and sentiment analysis. Different ML APIs tackling the same task can have very heterogeneous performance. Moreover, the ML models underlying the APIs also evolve over time. As ML APIs rapidly become a valuable marketplace and a widespread way to consume machine learning, it is critical to systematically study and compare different APIs with each other and to characterize how APIs change over time. However, this topic is currently underexplored due to the lack of data. In this paper, we present HAPI (History of APIs), a longitudinal dataset of 1,761,417 instances of commercial ML API applications (involving APIs from Amazon, Google, IBM, Microsoft and other providers) across diverse tasks including image tagging, speech recognition and text mining from 2020 to 2022. Each instance consists of a query input for an API (e.g., an image or text) along with the API's output prediction/annotation and confidence scores. HAPI is the first large-scale dataset of ML API usages and is a unique resource for studying ML-as-a-service (MLaaS). As examples of the types of analyses that HAPI enables, we show that ML APIs' performance changes substantially over time: several APIs' accuracies dropped on specific benchmark datasets. Even when an API's aggregate performance stays steady, its error modes can shift across different subtypes of data between 2020 and 2022. Such changes can substantially impact the entire analytics pipelines that use some ML API as a component. We further use HAPI to study commercial APIs' performance disparities across demographic subgroups over time. HAPI can stimulate more research in the growing field of MLaaS.


Did the Model Change? Efficiently Assessing Machine Learning API Shifts

Chen, Lingjiao, Cai, Tracy, Zaharia, Matei, Zou, James

arXiv.org Artificial Intelligence

Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it is often not clear to the user if and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g. if consistency is desired). In this paper, we initiate a systematic investigation of ML API shifts. We first quantify the performance shifts from 2020 to 2021 of popular ML APIs from Google, Microsoft, Amazon, and others on a variety of datasets. We identified significant model shifts in 12 out of 36 cases we investigated. Interestingly, we found several datasets where the API's predictions became significantly worse over time. This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant. Monitoring confusion matrix shifts using standard random sampling can require a large number of samples, which is expensive as each API call costs a fee. We propose a principled adaptive sampling algorithm, MASA, to efficiently estimate confusion matrix shifts. MASA can accurately estimate the confusion matrix shifts in commercial ML APIs using up to 90% fewer samples compared to random sampling. This work establishes ML API shifts as an important problem to study and provides a cost-effective approach to monitor such shifts.
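
As a rough illustration of what a confusion matrix shift is, the sketch below compares two snapshots of an API's predictions on the same labeled samples. It uses every sample uniformly and is not the MASA adaptive sampling algorithm, whose details the abstract does not give; the function names and the class labels in the usage lines are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Cell = Tuple[str, str]  # (true label, predicted label)


def confusion(truth: Sequence[str], preds: Sequence[str],
              classes: Sequence[str]) -> Dict[Cell, float]:
    """Normalized confusion matrix: fraction of samples in each cell."""
    if len(truth) != len(preds) or not truth:
        raise ValueError("truth and preds must be non-empty and equal length")
    counts = Counter(zip(truth, preds))
    n = len(truth)
    return {(t, p): counts[(t, p)] / n for t in classes for p in classes}


def confusion_shift(truth: Sequence[str], old_preds: Sequence[str],
                    new_preds: Sequence[str],
                    classes: Sequence[str]) -> Dict[Cell, float]:
    """Cell-wise difference (new minus old) between two API snapshots."""
    old = confusion(truth, old_preds, classes)
    new = confusion(truth, new_preds, classes)
    return {cell: new[cell] - old[cell] for cell in old}


# Toy data: the newer snapshot starts mislabeling one "cat" as "dog".
truth = ["cat", "cat", "dog", "dog"]
shift = confusion_shift(truth,
                        ["cat", "cat", "dog", "dog"],
                        ["cat", "dog", "dog", "dog"],
                        ["cat", "dog"])
print(shift[("cat", "dog")])
```

Because every sample here requires two paid API calls, this uniform baseline is exactly the expensive procedure that a more sample-efficient estimator would aim to improve on.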


5 Ways to Advance Your Machine Learning Initiatives

#artificialintelligence

There is no doubt that AI (artificial intelligence) is the new electricity, and everyone is trying to benefit from the trend. Many companies are integrating AI solutions into their business operations to reap the benefits of emerging machine learning (ML) technologies. The seamless introduction of AI, however, requires thoughtful adaptation of corporate strategy to the requirements of this emerging technology. As a partner at a venture studio, I see companies try to get into the trenches of machine intelligence without proper preparation. Here's what we recommend to help companies advance their initiatives effectively.


Try out Machine Learning services on SAP Cloud Platform

#artificialintelligence

SAP Leonardo Machine Learning Foundational APIs have recently been made available in the trial landscape. Anyone can register for a trial account and test drive these ML APIs. In this blog, I want to quickly show you how to get started using the ML APIs to work with the pre-trained models. At SAPPHIRE, the Machine Learning team also announced a set of new pre-trained and customizable services for face detection, scene text recognition, etc. In this blog, I would like to focus on scene text recognition, which makes it possible to read text from natural images/scenes.